bq_table_download | R Documentation |
This function provides two ways to download data from BigQuery, transferring data using either JSON or arrow, depending on the api argument. If bigrquerystorage is installed, api = "arrow" will be used (because it's so much faster, but see the limitations below); otherwise you can select deliberately by using api = "json" or api = "arrow".
The arrow API is much faster, but has heavier dependencies: bigrquerystorage requires the arrow package, which can be tricky to compile on Linux (but you should usually be able to get a binary from Posit Public Package Manager).
There's one known limitation of api = "arrow": when querying public data, you'll now need to provide a billing project.
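As a rough sketch of the selection logic and the billing requirement described above, the helper below picks the API by checking whether bigrquerystorage is installed and always passes an explicit billing project. The names pick_api(), download_public() and the billing_project argument are hypothetical, introduced only for illustration; the actual call needs BigQuery credentials.

```r
# Assumption: mirror the documented default by checking for bigrquerystorage.
pick_api <- function() {
  if (requireNamespace("bigrquerystorage", quietly = TRUE)) "arrow" else "json"
}

# Hypothetical helper: download rows from a public table while billing
# a project you control (required for public data when api = "arrow").
download_public <- function(billing_project, n_max = 1000) {
  bigrquery::bq_table_download(
    "publicdata.samples.natality",
    n_max = n_max,
    api = pick_api(),
    billing = billing_project
  )
}
```

Calling download_public() with the ID of a project you are allowed to bill should return the first n_max rows, provided you are authenticated.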
The JSON API retrieves rows in chunks of page_size. It is most suitable for results of smaller queries (<100 MB, say). Unfortunately, due to limitations in the BigQuery API, you may need to vary this parameter depending on the complexity of the underlying data.
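One possible way to vary page_size is to retry with progressively smaller pages when a download fails. This is only a sketch under that assumption, not behaviour built into bigrquery: download_json(), its page_sizes values, and the injectable download argument are all hypothetical.

```r
# Hypothetical helper: try the JSON API with decreasing page sizes.
# `download` defaults to bq_table_download but can be swapped for testing.
download_json <- function(tb, page_sizes = c(10000, 2500, 500),
                          download = bigrquery::bq_table_download) {
  last_error <- NULL
  for (ps in page_sizes) {
    result <- tryCatch(
      download(tb, page_size = ps, api = "json"),
      error = function(e) e  # capture the error instead of aborting
    )
    if (!inherits(result, "error")) return(result)
    last_error <- result
  }
  # Every page size failed: re-signal the last error seen.
  stop(last_error)
}
```

With a real bq_table this would be called as download_json(tb); the page sizes shown are arbitrary starting points rather than recommended values.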
The JSON API will convert nested and repeated columns into list-columns as follows:
Repeated values (arrays) will become a list-column of vectors.
Records will become list-columns of named lists.
Repeated records will become list-columns of data frames.
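To make the three shapes above concrete, here is a hand-built tibble that mimics what such a result might look like. It does not contact BigQuery, and the column names (tags, owner, visits) are invented for illustration.

```r
library(tibble)

# Hand-built stand-in for a downloaded result; all names are hypothetical.
res <- tibble(
  id = 1:2,
  # Repeated values (arrays): a list-column of vectors
  tags = list(c("a", "b"), character()),
  # Record: a list-column of named lists
  owner = list(
    list(name = "x", age = 30L),
    list(name = "y", age = 41L)
  ),
  # Repeated record: a list-column of data frames
  visits = list(
    tibble(day = c("mon", "tue"), n = c(1L, 3L)),
    tibble(day = character(), n = integer())
  )
)

res$tags[[1]]        # the vector stored for row 1
res$owner[[2]]$name  # one field of the record in row 2
res$visits[[1]]$n    # one column of the nested data frame in row 1
```

Each element is reached with [[ ]] on the list-column, followed by the usual accessor for a vector, named list, or data frame.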
bq_table_download(
x,
n_max = Inf,
page_size = NULL,
start_index = 0L,
max_connections = 6L,
quiet = NA,
bigint = c("integer", "integer64", "numeric", "character"),
api = c("json", "arrow"),
billing = x$project,
max_results = deprecated()
)
x |
A bq_table |
n_max |
Maximum number of results to retrieve. Use |
page_size |
(JSON only) The number of rows requested per chunk. It is
recommended to leave this unspecified until you have evidence that the
When |
start_index |
(JSON only) Starting row index (zero-based). |
max_connections |
(JSON only) Maximum number of simultaneous connections to BigQuery servers. |
quiet |
If |
bigint |
The R type that BigQuery's 64-bit integer types should be
mapped to. The default is |
api |
Which API to use? The Because the |
billing |
(Arrow only) Project to bill; defaults to the project of |
max_results |
Because data retrieval may generate list-columns and the data.frame print method can have problems with list-columns, this method returns a tibble. If you need a data.frame, coerce the results with as.data.frame().
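A minimal sketch of that coercion, using a small hand-built tibble in place of a real download (the columns are made up for illustration):

```r
library(tibble)

# Stand-in for the tibble returned by bq_table_download()
res <- tibble(id = 1:2, tags = list(c("a", "b"), "c"))

# Coerce to a plain data.frame; the list-column is carried over as a list
df <- as.data.frame(res)
class(df)
```

Here class(df) is "data.frame", and df$tags is still a list with one element per row.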
df <- bq_table_download("publicdata.samples.natality", n_max = 35000, billing = bq_test_project())