spark_read_td: Read a Treasure Data table into a Spark DataFrame


View source: R/spark_td.R

Description

Read a Treasure Data table into a Spark DataFrame

Usage

spark_read_td(sc, name, source, options = list(), repartition = 0,
  memory = TRUE, overwrite = TRUE)

Arguments

sc

A spark_connection.

name

The name to assign to the newly generated table in Spark.

source

Name of the source table on TD, in "database.table" format. Example: "sample_datasets.www_access"

options

A list of strings with additional options.

repartition

The number of partitions used to distribute the generated table. Use 0 (the default) to avoid partitioning.

memory

Boolean; should the data be loaded eagerly into memory? (That is, should the table be cached?)

overwrite

Boolean; overwrite the table with the given name if it already exists?
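
For example, a minimal sketch that combines the arguments above (it assumes sc is an open spark_connection configured for td-spark as described under Details; the name "access_sample" is illustrative, not part of this package):

access <- spark_read_td(
  sc,
  name = "access_sample",                 # illustrative Spark table name
  source = "sample_datasets.www_access",
  repartition = 8,                        # split the result into 8 partitions
  memory = FALSE,                         # do not eagerly cache the table
  overwrite = TRUE                        # replace an existing "access_sample" table
)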

Details

You can read a TD table through td-spark. You must set spark.td.apikey and spark.serializer appropriately in your Spark configuration.
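
For instance, a minimal configuration sketch; the resulting config is then passed to spark_connect() (the full connection flow appears under Examples):

config <- spark_config()
config$spark.td.apikey <- Sys.getenv("TD_API_KEY")   # TD API key from the environment
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"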

See Also

Other Spark serialization routines: spark_execute_td_presto, spark_read_td_presto, spark_read_td_query, spark_write_td

Examples

## Not run: 
config <- spark_config()

config$spark.td.apikey <- Sys.getenv("TD_API_KEY")
config$spark.serializer <- "org.apache.spark.serializer.KryoSerializer"
config$spark.sql.execution.arrow.enabled <- "true"

sc <- spark_connect(master = "local", config = config)

www_access <-
  spark_read_td(
    sc,
    name = "www_access",
    source = "sample_datasets.www_access"
  )

## End(Not run)
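
The returned object can be used as a remote dplyr table, so, for example, an aggregation can be pushed down to Spark (a sketch assuming the dplyr package is installed and that www_access has a method column, as the TD sample table does):

## Not run: 
library(dplyr)

## The aggregation runs on Spark; only the summary is collected back to R.
www_access %>%
  group_by(method) %>%
  summarise(n = n())

## End(Not run)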
