spark_read_td_presto: Read Treasure Data data from Presto via api-presto gateway

Description Usage Arguments Details See Also Examples

View source: R/td_query.R

Description

Read Treasure Data data from Presto via api-presto gateway

Usage

spark_read_td_presto(sc, name, source, query, options = list(),
  repartition = 0, memory = TRUE, overwrite = TRUE)

Arguments

sc

A spark_connection.

name

The name to assign to the newly generated table on Spark.

source

Database name of the table on TD, e.g. "sample_datasets".

query

A SQL query to execute.

options

A named list of additional options, as strings.

repartition

The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning.

memory

Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?)

overwrite

Boolean; overwrite the table with the given name if it already exists?

Details

You can execute queries against TD through td-spark. You must set spark.td.apikey and spark.serializer appropriately in your Spark configuration before connecting.
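As a minimal sketch, these settings can be supplied through a sparklyr config object before the connection is made (the KryoSerializer value matches the Examples below; reading the API key from the TD_API_KEY environment variable is an assumption for illustration):

```r
library(sparklyr)

# Build a Spark config carrying the settings td-spark requires.
config <- spark_config()
config$spark.td.apikey <- Sys.getenv("TD_API_KEY")  # TD API key, assumed to be in the environment
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"

sc <- spark_connect(master = "local", config = config)
```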

See Also

Other Spark serialization routines: spark_execute_td_presto, spark_read_td_query, spark_read_td, spark_write_td

Examples

## Not run: 
library(sparklytd)
library(sparklyr)
library(dplyr)
config <- spark_config()

config$spark.td.apikey <- Sys.getenv("TD_API_KEY")
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
config$spark.sql.execution.arrow.enabled <- "true"

sc <- spark_connect(master = "local", config = config)

df <- spark_read_td_presto(sc,
  "sample",
  "sample_datasets",
  "select count(1) from www_access") %>% collect()

## End(Not run)

chezou/sparklytd documentation built on Oct. 27, 2019, 2:32 a.m.