api-perform    R Documentation

BigQuery jobs: perform a job

Description

These functions are low-level functions designed to be used by experts. Each of these low-level functions is paired with a high-level function that you should use instead:

  • bq_perform_copy(): bq_table_copy().

  • bq_perform_query(): bq_dataset_query(), bq_project_query().

  • bq_perform_upload(): bq_table_upload().

  • bq_perform_load(): bq_table_load().

  • bq_perform_extract(): bq_table_save().
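
For example, a minimal sketch contrasting the two approaches for a query ("my-project" and the SQL are placeholders); the high-level wrapper starts the job, waits for it, and returns the destination table, while the low-level pair leaves those steps to you:

# High-level: returns a bq_table holding the results
tb <- bq_project_query("my-project", "SELECT 1 AS x")

# Low-level equivalent: start the job yourself, then wait on it
job <- bq_perform_query("SELECT 1 AS x", billing = "my-project")
bq_job_wait(job)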

Usage

bq_perform_extract(
  x,
  destination_uris,
  destination_format = "NEWLINE_DELIMITED_JSON",
  compression = "NONE",
  ...,
  print_header = TRUE,
  billing = x$project
)

bq_perform_upload(
  x,
  values,
  fields = NULL,
  source_format = c("NEWLINE_DELIMITED_JSON", "PARQUET"),
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...,
  billing = x$project
)

bq_perform_load(
  x,
  source_uris,
  billing = x$project,
  source_format = "NEWLINE_DELIMITED_JSON",
  fields = NULL,
  nskip = 0,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...
)

bq_perform_query(
  query,
  billing,
  ...,
  parameters = NULL,
  destination_table = NULL,
  default_dataset = NULL,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  use_legacy_sql = FALSE,
  priority = "INTERACTIVE"
)

bq_perform_query_dry_run(
  query,
  billing,
  ...,
  default_dataset = NULL,
  parameters = NULL,
  use_legacy_sql = FALSE
)

bq_perform_copy(
  src,
  dest,
  create_disposition = "CREATE_IF_NEEDED",
  write_disposition = "WRITE_EMPTY",
  ...,
  billing = NULL
)

Arguments

x

A bq_table

destination_uris

A character vector of fully qualified Google Cloud Storage URIs where the extracted table should be written. Can export up to 1 GB of data per file. Use a wildcard URI (e.g. gs://[YOUR_BUCKET]/file-name-*.json) to automatically create any number of files.
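
For example, a sketch of an extract job using a wildcard URI (the project, dataset, table, and bucket names are placeholders):

job <- bq_perform_extract(
  bq_table("my-project", "my_dataset", "my_table"),
  destination_uris = "gs://my-bucket/my_table-*.json"
)
bq_job_wait(job)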

destination_format

The exported file format:

  • For CSV files, specify "CSV" (nested and repeated data are not supported).

  • For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON".

  • For Avro, specify "AVRO".

  • For parquet, specify "PARQUET".

compression

The compression type to use for exported files:

  • For CSV files: "GZIP" or "NONE".

  • For newline-delimited JSON: "GZIP" or "NONE".

  • For Avro: "DEFLATE", "SNAPPY" or "NONE".

  • For parquet: "SNAPPY", "GZIP", "ZSTD" or "NONE".

...

Additional arguments passed on to the underlying API call. snake_case names are automatically converted to camelCase.

print_header

Whether to print out a header row in the results.

billing

Identifier of project to bill.

values

Data frame of values to insert.

fields

A bq_fields specification, or something coercible to it (like a data frame). Leave as NULL to allow BigQuery to auto-detect the fields.
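
For example, a sketch of two ways to supply a schema explicitly (the column names are illustrative):

fields <- bq_fields(list(
  bq_field("name", "STRING"),
  bq_field("count", "INTEGER")
))
# or derive the schema from a prototype data frame
fields <- as_bq_fields(data.frame(name = character(), count = integer()))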

source_format

The format of the data files:

  • For CSV files, specify "CSV".

  • For datastore backups, specify "DATASTORE_BACKUP".

  • For newline-delimited JSON, specify "NEWLINE_DELIMITED_JSON".

  • For Avro, specify "AVRO".

  • For parquet, specify "PARQUET".

  • For orc, specify "ORC".

create_disposition

Specifies whether the job is allowed to create new tables.

The following values are supported:

  • "CREATE_IF_NEEDED": If the table does not exist, BigQuery creates the table.

  • "CREATE_NEVER": The table must already exist. If it does not, a 'notFound' error is returned in the job result.

write_disposition

Specifies the action that occurs if the destination table already exists. The following values are supported:

  • "WRITE_TRUNCATE": If the table already exists, BigQuery overwrites the table data.

  • "WRITE_APPEND": If the table already exists, BigQuery appends the data to the table.

  • "WRITE_EMPTY": If the table already exists and contains data, a 'duplicate' error is returned in the job result.

source_uris

The fully-qualified URIs that point to your data in Google Cloud.

For Google Cloud Storage URIs: each URI can contain one '*' wildcard character, and it must come after the 'bucket' name. Size limits related to load jobs apply to external data sources.

For Google Cloud Bigtable URIs: exactly one URI can be specified, and it has to be a fully specified and valid HTTPS URL for a Google Cloud Bigtable table.

For Google Cloud Datastore backups: exactly one URI can be specified, and the '*' wildcard character is not allowed.

nskip

For source_format = "CSV", the number of header rows to skip.
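
For example, a sketch of loading wildcard-matched CSV files from Google Cloud Storage, skipping one header row (all names are placeholders):

job <- bq_perform_load(
  bq_table("my-project", "my_dataset", "loaded_csv"),
  source_uris = "gs://my-bucket/data-*.csv",
  source_format = "CSV",
  nskip = 1
)
bq_job_wait(job)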

query

SQL query string.

parameters

Named list of parameters to match to query parameters. Parameter x will be matched to placeholder @x.

Generally, you can supply R vectors and they will be automatically converted to the correct type. If you need greater control, you can call bq_param_scalar() or bq_param_array() explicitly.

See https://cloud.google.com/bigquery/docs/parameterized-queries for more details.
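
For example, a sketch of a parameterized query (the project, table, and values are placeholders); the scalar is converted automatically, while the array is wrapped explicitly:

job <- bq_perform_query(
  "SELECT * FROM my_dataset.my_table WHERE cyl = @cyl AND gear IN UNNEST(@gears)",
  billing = "my-project",
  parameters = list(
    cyl = 4L,
    gears = bq_param_array(c(3L, 4L))
  )
)
bq_job_wait(job)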

destination_table

A bq_table where results should be stored. If not supplied, results will be saved to a temporary table that lives in a special dataset. You must supply this parameter for large queries (> 128 MB compressed).
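
For example, a sketch of writing query results to a named table instead of a temporary one (the table references are placeholders):

dest <- bq_table("my-project", "my_dataset", "query_results")
job <- bq_perform_query(
  "SELECT * FROM my_dataset.big_table",
  billing = "my-project",
  destination_table = dest,
  write_disposition = "WRITE_TRUNCATE"
)
bq_job_wait(job)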

default_dataset

A bq_dataset used to automatically qualify table names.

use_legacy_sql

If TRUE, uses BigQuery's legacy SQL format.

priority

Specifies a priority for the query. Possible values include "INTERACTIVE" and "BATCH". Batch queries do not start immediately, but are not rate-limited in the same way as interactive queries.
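
For example, a sketch of submitting a query at batch priority (placeholders as above); bq_job_wait() simply blocks until the scheduler runs the job:

job <- bq_perform_query(
  "SELECT COUNT(*) AS n FROM my_dataset.my_table",
  billing = "my-project",
  priority = "BATCH"
)
bq_job_wait(job)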

Value

A bq_job.
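
For example, a sketch of working with the returned job (the project and SQL are placeholders):

job <- bq_perform_query("SELECT 1 AS x", billing = "my-project")
bq_job_status(job)  # poll the job once
bq_job_wait(job)    # or block until it finishes, surfacing any errors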

Google BigQuery API documentation

Additional information at: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs

Examples


# Create a test dataset and a table reference within it
ds <- bq_test_dataset()
bq_mtcars <- bq_table(ds, "mtcars")

# Start the upload job; it runs asynchronously, so the table
# may not exist yet
job <- bq_perform_upload(bq_mtcars, mtcars)
bq_table_exists(bq_mtcars)

# Block until the job completes; the table is then available
bq_job_wait(job)
bq_table_exists(bq_mtcars)
head(bq_table_download(bq_mtcars))

