readme.md

R package with BigQuery utility functions

Automates common BigQuery tasks that are not easily achieved with the bigrquery package. Beyond this "cosmetic" aspect, the main feature is downloading large amounts of data into memory through Google Cloud Storage, for cases where conventional methods become too slow.

Installation

  1. Install devtools: install.packages("devtools")
  2. Install from GitHub: devtools::install_github("bizovi/gcphelper")
  3. Use it: library(gcphelper)
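
Put together, the full installation (assuming a CRAN mirror is already configured):

# install devtools from CRAN, then gcphelper from GitHub
install.packages("devtools")
devtools::install_github("bizovi/gcphelper")
library(gcphelper)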

Getting Started

Authentication in the Google Cloud Platform can be a bit daunting when automating workflows that depend on multiple services. I found the following approach to work for most scenarios. You'll need:

  1. A service account and its service key (a JSON file)
  2. Necessary and sufficient rights for BigQuery and Google Cloud Storage (gs://)

With those in place, authenticate once per session:

library(googleAuthR)

project_id <- "<my_project>"
# the cloud-platform scope covers both BigQuery and Cloud Storage
scopes <- "https://www.googleapis.com/auth/cloud-platform"

options(googleAuthR.scopes.selected = scopes)
# authenticate with the service account key downloaded from GCP
service_token <- googleAuthR::gar_auth_service(
    json_file = "<service_key>.json")
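
If you authenticate in several scripts, a small helper keeps the setup in one place. A minimal sketch using the same calls as above; the function name gcp_auth is my own and not part of the package:

library(googleAuthR)

# hypothetical wrapper around the authentication steps above
gcp_auth <- function(json_file,
                     scopes = "https://www.googleapis.com/auth/cloud-platform") {
  options(googleAuthR.scopes.selected = scopes)
  googleAuthR::gar_auth_service(json_file = json_file)
}

service_token <- gcp_auth("<service_key>.json")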

The main use cases are saving query results to tables and downloading data into data.frames. TBD: creating views.

library(gcphelper)

# to download the data into memory
data <- gcphelper::bq_query_gcs(
  sql = "select * from `project.dataset.table`",
  project = "<project_id>",
  use_legacy_sql = FALSE,
  quiet = FALSE,
  target_dataset = "<target_dataset>",
  target_bucket  = "<target_bucket>",
  target_directory = getwd(), # downloads data to current directory
  # give it a random job name built from letters
  job_name = paste(sample(x = LETTERS, size = 15, replace = TRUE), collapse = ""),
  multi_file = FALSE)

# to save a query result as a table
result <- gcphelper::view_to_table(
  project_id = "<project_id>",
  view_path  = "dataset.dest_table",
  query      = "select * from `project.dataset.table`",
  dataset    = "target_dataset",
  table      = "target_table")
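
Assuming the query and download succeed, data is a regular data.frame, so the usual inspection tools apply:

# quick sanity checks on the downloaded data
str(data)   # column types
head(data)  # first rows
nrow(data)  # row count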


