spark_read_td_presto: Read Treasure Data data from Presto via api-presto gateway

Description Usage Arguments Details See Also Examples

View source: R/td_query.R

Description

Read Treasure Data data from Presto via api-presto gateway

Usage

spark_read_td_presto(sc, name, source, query, options = list(),
  repartition = 0, memory = TRUE, overwrite = TRUE)

Arguments

sc

A spark_connection.

name

The name to assign to the newly generated table on Spark.

source

Database name of the table on TD, e.g. "sample_datasets".

query

A SQL query to execute.

options

A named list of additional options, as strings.

repartition

The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning.

memory

Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?)

overwrite

Boolean; overwrite the table with the given name if it already exists?

Details

You can execute queries against TD through td-spark. You must set spark.td.apikey and spark.serializer appropriately in your Spark configuration before connecting.
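As a minimal sketch, these settings can be supplied through a sparklyr config object before the connection is made (the KryoSerializer value matches the Examples below; reading the API key from the TD_API_KEY environment variable is an assumption for illustration):

```r
library(sparklyr)

# Build a Spark config carrying the settings td-spark requires.
config <- spark_config()
config$spark.td.apikey <- Sys.getenv("TD_API_KEY")  # TD API key, assumed to be in the environment
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"

sc <- spark_connect(master = "local", config = config)
```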

See Also

Other Spark serialization routines: spark_execute_td_presto, spark_read_td_query, spark_read_td, spark_write_td

Examples

## Not run: 
library(sparklytd)
library(sparklyr)
library(dplyr)
config <- spark_config()

config$spark.td.apikey <- Sys.getenv("TD_API_KEY")
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
config$spark.sql.execution.arrow.enabled <- "true"

sc <- spark_connect(master = "local", config = config)

df <- spark_read_td_presto(sc,
  "sample",
  "sample_datasets",
  "select count(1) from www_access") %>% collect()

## End(Not run)

chezou/sparklytd documentation built on Oct. 27, 2019, 2:32 a.m.